Distinguish the models used in the executor and evaluator
Signed-off-by: Tarek <tareknaser360@gmail.com>
…s/sysmobench/sysmobench_core'
- Add gpt-4o model configuration to models.yaml
- Fix setup_tools.py to use shutil.move instead of os.rename
This resolves the 'Invalid cross-device link' error when /tmp is on a different filesystem
* added cmu15-213 data lab
* docs(courselab): add note about infrastructure restrictions
Signed-off-by: Tarek <tareknaser360@gmail.com>
---------
Signed-off-by: Tarek <tareknaser360@gmail.com>
Co-authored-by: Tarek <tareknaser360@gmail.com>
* add cs537 fall 2021 final exam
* add institution
* fix
* add solutions
* update metadata
* add choice array
* avoid extra restrictions on LLM output
Signed-off-by: Tarek <tareknaser360@gmail.com>
---------
Signed-off-by: Tarek <tareknaser360@gmail.com>
Co-authored-by: Tarek <tareknaser360@gmail.com>
tareknaser
left a comment
Thanks for the great work. This looks almost ready to merge. I made a few small updates including adding a course entry and a reference solution (based on Claude’s trajectory) and rebasing on top of main. I’ll add a couple more minor updates in separate comments for you to review.
If everything looks good, we can go ahead and merge.
Do you think we can simplify this file to be:
#!/bin/bash
set -e
echo "=== Setting up CMU 15-445 CountMinSketch Lab ==="
cd /workspace
echo "Installing git"
apt-get update > /dev/null 2>&1
apt-get install -y git > /dev/null 2>&1
echo "Cloning bustub repository"
git clone https://github.com/cmu-db/bustub.git /tmp/bustub > /dev/null 2>&1
git -C /tmp/bustub checkout bd3912741c45370d5f9c7bef638452b10b140138 > /dev/null 2>&1
echo "Moving source to workspace"
mv /tmp/bustub/* ./
mv /tmp/bustub/.clang-format ./ 2>/dev/null || true
mv /tmp/bustub/.clang-tidy ./ 2>/dev/null || true
rm -rf /tmp/bustub .git
echo "Installing build dependencies"
build_support/packages.sh -y > /dev/null 2>&1
echo "Creating checksums for protected files"
mkdir -p /tmp/checksums
sha256sum test/primer/count_min_sketch_test.cpp > /tmp/checksums/test.sha256
echo "Building project"
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug .. > /dev/null 2>&1
make -j"$(nproc)" > /dev/null 2>&1
echo "Setup complete"
echo "Agent should implement:"
echo " - src/include/primer/count_min_sketch.h"
echo " - src/primer/count_min_sketch.cpp"
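The checksum step in the setup script only helps if a later step checks against it. A minimal sketch of that record-then-verify pattern is below. It uses throwaway temp paths rather than the lab's real layout, and the file name `protected_test.cpp` and the `verify_protected` helper are assumptions made for illustration only.

```shell
#!/bin/bash
# Sketch: record a checksum for a protected file, then verify it later.
# All paths are temporary stand-ins, not the lab's actual directories.

workdir="$(mktemp -d)"
cd "$workdir" || exit 1

mkdir -p test checksums
echo "original test body" > test/protected_test.cpp

# Setup phase: record the checksum. The path is stored as a relative path,
# so verification must run from the same directory.
sha256sum test/protected_test.cpp > checksums/test.sha256

# Hypothetical helper: returns 0 if the protected file is unchanged.
verify_protected() {
    sha256sum -c checksums/test.sha256 > /dev/null 2>&1
}

if verify_protected; then
    echo "Protected files unchanged"
fi

# Simulate tampering, then verify again.
echo "tampered" >> test/protected_test.cpp
if ! verify_protected; then
    echo "FAIL: test/protected_test.cpp was modified"
fi
```

Because `sha256sum -c` resolves the recorded path relative to the current directory, the check has to run from the directory where the checksum was created.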
And the evaluation script to be:
#!/bin/bash
set -e
cd /workspace
# Verify test file wasn't modified
echo "Verifying protected files were not modified"
if ! sha256sum -c /tmp/checksums/test.sha256 > /dev/null 2>&1; then
    echo "FAIL: test/primer/count_min_sketch_test.cpp was modified"
    exit 1
fi
echo "Protected files unchanged"
# Build
echo ""
echo "=== Building ==="
rm -rf build
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug .. > /dev/null 2>&1
if ! make -j"$(nproc)"; then
    echo "FAIL: Build failed"
    exit 1
fi
# Run tests
echo ""
echo "=== Running Tests ==="
make -j"$(nproc)" count_min_sketch_test > /dev/null 2>&1
if ! ./test/count_min_sketch_test; then
    echo "FAIL: Tests failed"
    exit 1
fi
# Format check
echo ""
echo "=== Format Check ==="
make format > /dev/null 2>&1
if ! make check-clang-tidy-p0; then
    echo "FAIL: clang-tidy check failed"
    exit 1
fi
echo ""
echo "PASS: All checks passed"
exit 0
There is no need to have a scoring scheme since we just report pass/fail. What do you think?
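To illustrate the pass/fail idea, here is a small sketch that reduces any evaluation command to a binary verdict based only on its exit status. `run_and_report` is a hypothetical helper, not something in the benchmark, and `true` / `false` merely stand in for a passing and a failing evaluation script.

```shell
#!/bin/bash
# Sketch: turn a command's exit status into a PASS/FAIL verdict.
# run_and_report is a hypothetical helper used only for illustration.
run_and_report() {
    if "$@" > /dev/null 2>&1; then
        echo "PASS"
    else
        echo "FAIL"
    fi
}

run_and_report true    # stand-in for an evaluation script that exits 0
run_and_report false   # stand-in for an evaluation script that exits non-zero
```

With this shape, whatever calls the evaluation script only needs the exit code, so no per-check point values have to be maintained.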
Thank you Tarek! I will add more tests to this PR to scale it up~
@Jackcuii For tests, do you mean that we can have more? Can you please add more tests as soon as possible? We need to merge this PR. Thanks a lot.
Hi Xuan! Yes, we can have more! Sorry for the delay, I am traveling back home these days. I will push hard after I arrive home on the 4th 😃. I may need to change the test workflow to a 'consecutive test', meaning the 4 remaining tests run back to back. That is because labs 2, 3, and 4 of 15-445 each build on the previous lab, and we do not have a golden version of the project. So we need to make the agent work through the 4 labs consecutively in one go.
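One possible shape for such a consecutive run, sketched under assumptions: the stage names `lab1` to `lab4` and the `evaluate_stage` stub are hypothetical placeholders, and a real evaluator would build and test the agent's code instead of consulting a variable. The point is only the control flow, which stops at the first failing lab because later labs depend on it.

```shell
#!/bin/bash
# Sketch: run dependent lab stages in order and stop at the first failure.
# Stage names and evaluate_stage are hypothetical placeholders.

# Stub evaluator: succeeds unless the stage is listed in FAILING_STAGES.
evaluate_stage() {
    case " ${FAILING_STAGES:-} " in
        *" $1 "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Prints one verdict line per stage that ran; returns non-zero on failure.
run_consecutive() {
    local stage
    for stage in "$@"; do
        if evaluate_stage "$stage"; then
            echo "PASS: $stage"
        else
            echo "FAIL: $stage"
            return 1   # later stages build on this one, so stop here
        fi
    done
}

run_consecutive lab1 lab2 lab3 lab4
```

Stopping early keeps the report honest: a lab that was never reached is not counted as failed on its own merits.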
This is a Draft PR
Description
This PR adds CMU 15-445 Lab 0 (Count-min Sketch) to the Benchmark Suite. The task requires implementing a thread-safe Count-min sketch data structure, a probabilistic data structure used for frequency estimation in streaming data. This lab focuses on C++ programming, concurrency, algorithms, and database systems concepts.
Changes
data/cmu_15-445/task_cpp/ with complete lab setup
Testing
E2E Tested with Claude Haiku
TODOs